Freeform SQL

Freeform SQL allows you to apply a complex condition across multiple tables or columns. You can also use it when you need more flexibility or customization of your rules.

Note  Use the built-in Spark functions for non-Pushdown connections. Native SQL will not work, unless you switch to Native SQL rules. See (https://spark.apache.org/docs/2.3.0/api/sql/) for Simple and Freeform SQL rules.

Important Column name references on the Spark dataframe do not necessarily match the column names.

Individual statement

Syntax

Copy
SELECT * FROM @<dataset_name> <table_alias>
WHERE <filter_expression>
GROUP BY <group_by_expression>
HAVING <having_expression>

The base of the statement is given with @<dataset_name> style. In general, the <dataset_name> is the same, where the rule is attached to, but you can generally use any valid dataset name in the expression.

Examples

Simple rule expression

Copy
opt.dataset = "example_ds"

val rule = OwlUtils.createRule(opt.dataset)
rule.setRuleNm("is_city_not_null_or_empty")
rule.setRuleValue("select * from @example_ds t where t.amount > '5000'")
rule.setRuleType("SQLF")
rule.setPerc(1.0)
rule.setPoints(1)
rule.setIsActive(1)
rule.setUserNm("admin")

Complex rule expression

Copy
opt.dataset = "unique_rule_ds"

val rule = RuleBll.createRule(opt.dataset)
rule.setRuleNm("unique_rule")
rule.setRuleValue("select * from ( select count(*) as cnt, customer_id from @unique_rule_ds group by customer_id ) having cnt > 1")
rule.setRuleType("SQLF")
rule.setPerc(1.0)
rule.setPoints(1)
rule.setIsActive(1)
rule.setUserNm("admin")

RegExp expression

Copy
opt.dataset = "regexp_rule_ds"

val rule = RuleBll.createRule(opt.dataset)
rule.setRuleNm("LIKE_rule")
rule.setRuleValue("select * from @regexp_rule_ds.SYMBOL rlike '^ABB+'  ")
rule.setRuleType("SQLG")
rule.setPerc(0.02)
rule.setPoints(1)
rule.setIsActive(1)
rule.setUserNm("admin")

Join statements

Available join types between multiple datasets

  • WHERE tableA.id = tableB.id style
  • INNER JOIN
  • LEFT <OUTER> JOIN
  • RIGHT <OUTER> JOIN

Joining other datasets

A @t1 rule is an out-of-the-box rule type that creates a dataset using the previous run’s date as the run date to read data from a multi-day run period so that data from the current and previous days can be evaluated.

To get the historical state of the same dataset:

Syntax: @t<n>, where the parameter n means, the historical run (number) of the base dataset (marked with @<dataset_name>) to instruct the temporary job created at runtime to target. The following table demonstrates what each of the following @tn value examples mean:

@tn value Description
@t1 Loads the data from the previous scan.
@t2 Loads the data from the second to last scan.
@t3 Loads the data from the third to last scan.
@t4 Loads the data from the fourth to last scan

To get a different dataset:
Syntax: @<other_dataset_name>

Note When you use a rule with @t1 syntax, a temporary job in the Jobs queue appears. This temporary job displays a LOAD status to allow the run that the @t1 is targeting to load and use it for the rule.

Cross-Connection Libraries

Note When applying cross-connection rules please use the -addlib to submit the job with the appropriate jar files. In this example, a secondary set of jars is added through the Explorer. These files are located in the /opt/owl/drivers/mysql directory. The path should not contain double quotes or single quotes. It should point to a directory without spaces in the path.

WHERE style

Lookback dataset

Copy
SELECT * FROM @<dataset_name> <table_alias>, @t1 [<history_table_alias>]
WHERE <join_expression> AND <filter_expression>

Example

Copy
opt.dataset = "example_ds"

val rule = OwlUtils.createRule(opt.dataset)
rule.setRuleValue("select * from @example_ds t, @t1  where t.customer_id = t1.customer_id  and t.card_number <> t1.card_number ")
rule.setRuleType("SQLF")

Different dataset

Copy
SELECT * FROM @<dataset_name> <table_alias>, @<other_dataset_name> [<other_alias>]
WHERE <join_expression> AND <filter_expression>

Example

Copy
opt.dataset = "example_ds"
opt2.dataset = "other_ds"

val rule = RuleBll.createRule(opt.dataset)
rule.setRuleValue("select * from @example_ds t, @other_ds ds2 where t.customer_id = ds2.customer_id  and t.card_number <> ds2.card_number ")
rule.setRuleType("SQLF")

LEFT JOIN

Example

Copy
opt.dataset = "example_ds"

val rule = OwlUtils.createRule(opt.dataset)
rule.setRuleNm("not_back2back_days")
rule.setRuleValue(" select * from @example_ds A LEFT OUTER JOIN @t1 B ON A.customer_id = B.customer_id where A.customer_id is not null and B.customer_id is null  ")
rule.setRuleType("SQLF")
rule.setPerc(1.0)
rule.setPoints(1)
rule.setIsActive(1)
rule.setUserNm("admin")

Known Limitations

Freeform rules with fully qualified column names are currently unsupported when they use the following syntax:

select <column name> FROM @<dataset name> WHERE @<dataset name>.<column name> condition

A possible workaround to this limitation is to use aliasing instead.